RProtoBuf: Efficient Cross-Language Data Serialization in R
Abstract
Modern data collection and analysis pipelines often involve a sophisticated mix of applications written in general-purpose and specialized programming languages. Many formats commonly used to import and export data between different programs or systems, such as CSV or JSON, are verbose, inefficient, not type-safe, or tied to a specific programming language. Protocol Buffers are a popular method of serializing structured data between applications while remaining independent of programming languages and operating systems. They offer a unique combination of features, performance, and maturity that seems particularly well suited to data-driven applications and numerical computing. The RProtoBuf package provides a complete interface to Protocol Buffers from the R environment for statistical computing. This paper outlines the general class of data serialization requirements for statistical computing, describes the implementation of the RProtoBuf package, and illustrates its use with example applications in large-scale data collection pipelines and web services.
Similar resources
Toward Remote Object Coherence with Compiled Object Serialization for Distributed Computing with XML Web Services
Cross-platform object-level coherence in Web services-based distributed systems and grids requires lossless serialization to ensure programming-language specific objects are safely transmitted, manipulated, and stored. However, Web services development tools often suffer from lossy forms of XML serialization, which diminishes the usefulness of XML Web services as a competitive approach to binar...
XML Binary Serialization using Cross-Format Schema Protocol (XFSP) and XML Compression Considerations for Extensible 3D (X3D) Graphics
The NPS Cross-Format Schema Protocol (XFSP) has been developed as a general approach to binary serialization of XML documents. Elements and attributes are replaced via a tokenization scheme which carefully preserves valid XML document structure. XFSP uses XML schema as the basis for determining key document parameters such as legal elements, attributes and data types. Originally motivated by th...
Design and Development of an Efficient XML Parsing Algorithm: A Review
The extensible markup language XML has become the de facto standard for information representation and interchange on the Internet. As XML becomes widespread it is critical for application developers to understand the operational and performance characteristics of XML processing. The processing of XML documents has been regarded as the performance bottleneck in most systems and applications. XM...
A Cross-Language Type System for Information Semantics
We are designing and prototyping a cross-language type system for information semantics. Programming languages’ fundamental concepts of types and inheritance serve as the basis for cross-language representation of information, metadata semantics, and operations. The type system encompasses schemas, practical extraction rules from information published without formally exposed semantics, abstrac...
XBS: A Streaming Binary Serializer for High Performance Computing
High performance distributed systems communication requires that data first be serialized into a byte sequence suitable for transmission. A variety of different formats exist for serialization, ranging from XML-based formats to more efficient binary formats. This paper presents the XBS binary serialization library. XBS differs from other binary serializers in that it is a streaming serializer (...
Journal: CoRR
Volume: abs/1401.7372
Publication date: 2014